DATAX121-23A (HAM) & (SEC) - Introduction to Statistical Methods
Recall that
\[ \begin{aligned} H_0\!: & ~ \mu_\text{Low} = \mu_\text{Medium} = \mu_\text{High} \hspace{2.25em} \\ H_1\!: & ~ \text{At least one} ~ \mu_i \neq \mu_j \end{aligned} \]
We had strong evidence against the null that … in favour of the alternative that … (p-value = 0.01208)
# Fit the means-only model to the data
lm(GillRate ~ Calcium, data = respiration.df) |>
  # Decompose the total "variability" between and within groups
  anova()

Analysis of Variance Table

Response: GillRate
          Df  Sum Sq Mean Sq F value  Pr(>F)
Calcium    2  2037.2 1018.61  4.6484 0.01208 *
Residuals 87 19064.3  219.13
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To use ALL the data to quantify the uncertainty of each group’s sample mean, \(\text{se}(\bar{x}_i)\), we need to meet one assumption in particular:
All groups have a similar measure of spread
Conveniently, the statistic we require was already calculated for the F-test: The mean square for residuals, \(MSR\)
\[ MSR = \frac{SSR}{n-k} \]
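As a quick check of the formula, the following sketch recomputes \(MSR\) from the rounded values printed in the ANOVA table above. The object names (ssr, df_resid, msr) are illustrative only and do not come from the course code.

```r
# Values read off the ANOVA table for GillRate ~ Calcium (rounded as printed)
ssr      <- 19064.3  # Sum Sq in the Residuals row, i.e. SSR
df_resid <- 87       # Df in the Residuals row, i.e. n - k

# Mean square for residuals
msr <- ssr / df_resid
round(msr, 2)        # matches the Mean Sq entry in the Residuals row

# Its square root is a pooled estimate of the within-group spread
sqrt(msr)
```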
Consider a randomised experiment with mice where a numeric response variable is of interest and say we split them into four treatment groups
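One consequence of having four groups is the number of pairwise comparisons between group means. A minimal sketch, assuming hypothetical group labels A to D (the labels are not from the original example):

```r
# Hypothetical labels for the four treatment groups
groups <- c("A", "B", "C", "D")
k <- length(groups)

# Number of distinct pairs of groups: k choose 2
n_comparisons <- choose(k, 2)
n_comparisons

# List every pairwise comparison explicitly
comparisons <- combn(groups, 2, FUN = paste, collapse = " - ")
comparisons
```

The more comparisons we make, the more chances there are for a false positive, which is why the confidence intervals need an adjustment for the whole family of comparisons.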
\[ \bar{x}_i - \bar{x}_j \pm \frac{\text{Tukey}^*_{1-\alpha}(k, \nu)}{\sqrt{2}} \times \text{se}_{MSR}(\bar{x}_i - \bar{x}_j) \]
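where:
\(\text{Tukey}^*_{1-\alpha}(k, \nu)\) is the \(1-\alpha\) quantile of the studentised range distribution for \(k\) groups and \(\nu = n - k\) residual degrees of freedom
\(\text{se}_{MSR}(\bar{x}_i - \bar{x}_j) = \sqrt{MSR \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}\) is the standard error of the difference in sample means, based on \(MSR\)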
library(emmeans)

# Save the fitted means-only model so that emmeans can use it
respiration.fit <- lm(GillRate ~ Calcium, data = respiration.df)

# The emmeans package requires us to state the categorical explanatory variable
emmeans(respiration.fit, ~ Calcium) |>
  pairs(infer = TRUE)

 contrast      estimate   SE df lower.CL upper.CL t.ratio p.value
 High - Low      -10.33 3.82 87   -19.45    -1.22  -2.704  0.0223
 High - Medium    -0.50 3.82 87    -9.61     8.61  -0.131  0.9906
 Low - Medium      9.83 3.82 87     0.72    18.95   2.573  0.0313

Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
The emmeans package expects an lm() object, which is why we now “saved” its output to an R object named respiration.fit
The contrast column of the R output tells us the order of the pairwise comparison
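The High - Low interval in the output can be approximately reproduced by hand from the Tukey formula. This is a sketch under one stated assumption: the 90 observations (87 residual degrees of freedom plus 3 groups) are split equally, 30 per group. The inputs are the rounded values printed in the output, so the result may differ slightly in the last decimal place.

```r
# Inputs taken from the printed ANOVA and emmeans output (rounded)
msr      <- 219.13   # mean square for residuals
df_resid <- 87       # residual degrees of freedom, nu = n - k
k        <- 3        # number of Calcium groups
estimate <- -10.33   # High - Low difference in sample means

# Assumption: balanced design with 30 observations in every group
n_i <- 30
n_j <- 30

# Standard error of the difference, based on MSR
se_diff <- sqrt(msr * (1 / n_i + 1 / n_j))
round(se_diff, 2)    # agrees with the SE column

# Tukey multiplier: studentised range quantile divided by sqrt(2)
multiplier <- qtukey(0.95, nmeans = k, df = df_resid) / sqrt(2)

# Tukey-adjusted 95% confidence interval for High - Low
ci <- estimate + c(-1, 1) * multiplier * se_diff
round(ci, 2)         # close to lower.CL and upper.CL for High - Low
```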
Another question the confidence intervals can answer is whether any group differs significantly from the rest
Recall the output was
# The emmeans package requires us to state the categorical explanatory variable
emmeans(respiration.fit, ~ Calcium) |>
  pairs(infer = TRUE)

 contrast      estimate   SE df lower.CL upper.CL t.ratio p.value
 High - Low      -10.33 3.82 87   -19.45    -1.22  -2.704  0.0223
 High - Medium    -0.50 3.82 87    -9.61     8.61  -0.131  0.9906
 Low - Medium      9.83 3.82 87     0.72    18.95   2.573  0.0313

Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
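One way to read the output is to flag the intervals that exclude zero. The sketch below retypes the printed limits into a small data frame; tukey_ci and excludes_zero are illustrative names, not part of the emmeans output.

```r
# Confidence limits copied from the printed output above
tukey_ci <- data.frame(
  contrast = c("High - Low", "High - Medium", "Low - Medium"),
  lower.CL = c(-19.45, -9.61, 0.72),
  upper.CL = c(-1.22, 8.61, 18.95)
)

# An interval that lies entirely above or entirely below zero
# points to a difference between the two underlying means
tukey_ci$excludes_zero <- tukey_ci$lower.CL > 0 | tukey_ci$upper.CL < 0
tukey_ci
```

Both intervals involving Low exclude zero while the High - Medium interval does not, so Low is the group that stands apart from the other two.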